BTech Lab · Pandas Concepts

Pandas Essentials
for the Lab

All the core concepts you need to understand before writing code for Experiments 4, 5 & 6

Exp 4 · Array Powers Exp 5 · First N Rows Exp 6 · Missing Values

Use ← → keys or the buttons below to navigate

Session Overview

What We Cover Today

01

NumPy & Pandas Basics

Libraries, imports, array creation

02

DataFrame Creation

From dict, column/index types

03

Element-wise Powers

np.power, ** operator

04

Selecting Rows

head(), tail(), loc, iloc

05

Missing Values (NaN)

What NaN is, how it arises

06

Handling Missing Data

isnull, fillna, dropna

Libraries

Importing NumPy & Pandas

Before writing any data code, you must import the two powerhouse libraries. These are already installed in your lab environment.

import numpy as np    # array math, element-wise ops
import pandas as pd   # DataFrames & Series
🔢

NumPy

N-dimensional arrays, mathematical functions like np.power. The backbone of scientific Python.

🐼

Pandas

Built on top of NumPy. Provides DataFrame (2-D table) and Series (1-D column) — your main tools today.

Convention Matters

Always use np and pd as aliases — every textbook, tutorial and StackOverflow answer uses them.

NumPy

NumPy Arrays (ndarray)

A NumPy array is a grid of same-type values. Unlike a Python list, every operation on it works element-by-element automatically.

a = np.array([2, 3, 4])
b = np.array([1, 2, 3])

print(a + b)   # [3 5 7]  ← adds index-by-index
print(a * b)   # [ 2  6 12]
print(a ** b)  # [ 2  9 64] ← element-wise power!

Key property: Vectorization

An operation between two same-shape arrays is applied position by position, with no loops needed. Broadcasting extends this to compatible but different shapes, such as an array combined with a scalar.

dtype

All elements share one data type (int64, float64…). Pandas inherits this for DataFrame columns.
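Both behaviours above can be checked in a few lines (nothing here is lab-specific):

```python
import numpy as np

a = np.array([2, 3, 4])
b = np.array([1, 2, 3])

# Same-shape arrays combine position by position (vectorization)
summed = a + b        # [3 5 7]

# Broadcasting: a scalar is stretched to match the array's shape
doubled = a * 2       # [4 6 8]
```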

Exp 4 · Core Concept

np.power() — Element-wise Raise

np.power(x1, x2) raises each element of x1 to the corresponding element of x2.

np.power(x1, x2)
# x1 → base array  |  x2 → exponent array
# result[i] = x1[i] ** x2[i]  for every i

Concrete Example

bases  = np.array([2, 3, 4])
exps   = np.array([3, 2, 1])
result = np.power(bases, exps)
# result → [8, 9, 4]
#           2³ 3² 4¹
  • Works on 1-D arrays, 2-D arrays, and entire DataFrame columns
  • ** operator does the same thing: a ** b equals np.power(a, b)
  • Both arrays must be the same shape (or broadcastable)
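The claim about DataFrame columns is easy to verify; the column names base, exp and result below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy columns, just to show np.power working on whole Series
df = pd.DataFrame({'base': [2, 3, 4], 'exp': [3, 2, 1]})

df['result'] = np.power(df['base'], df['exp'])
print(df['result'].tolist())                # [8, 9, 4]

# The ** operator produces the same values
print((df['base'] ** df['exp']).tolist())   # [8, 9, 4]
```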
Exp 4 & 5 · Core Concept

Creating a DataFrame

A DataFrame is a 2-D labelled table — think of it as an Excel sheet inside Python. The most common way to build one is from a dictionary.

data = {
    'X': [78, 85, 96],
    'Y': [84, 94, 89],
    'Z': [86, 97, 96],
}
df = pd.DataFrame(data)
print(df)
    X   Y   Z
0  78  84  86
1  85  94  97
2  96  89  96

Dictionary keys → Columns

Each key becomes a column name. All lists must be equal length.

Auto Index (0, 1, 2…)

Pandas assigns a numeric row index unless you provide custom labels.

Exp 5 · Core Concept

Custom Row Labels (index=)

For Exp-5 the DataFrame uses letter labels instead of numbers. Pass your list of labels using the index= parameter.

labels = ['a', 'b', 'c', 'd', 'e',
          'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data, index=labels)

Why custom labels matter

  • Row labels become the index used by loc[]
  • Numeric position is still used by iloc[] (0-based, always)
  • Labels survive sorting, filtering, and merging — numbers don't
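A small sketch of that last point, using made-up scores: labels follow their rows through a sort, while positions are reassigned.

```python
import pandas as pd

df = pd.DataFrame({'score': [12.5, 9.0, 16.5]}, index=['a', 'b', 'c'])

# After sorting, each row keeps its label, so loc still works by name
sorted_df = df.sort_values('score')
print(sorted_df.index.tolist())      # ['b', 'a', 'c']
print(sorted_df.loc['b', 'score'])   # 9.0  (found by label)
print(sorted_df.iloc[0]['score'])    # 9.0  ('b' is now first by position)
```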

np.nan in data

When building the dictionary, use np.nan for missing entries. This requires importing NumPy first. It is the standard float sentinel for "no value".
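A sketch of such a dictionary (the names and scores here are illustrative, not the official Exp-5 data):

```python
import numpy as np
import pandas as pd

# Illustrative data only: np.nan marks the missing scores
exam_data = {
    'name':  ['Anya', 'Ben', 'Chen', 'Dev'],
    'score': [12.5, np.nan, 16.5, np.nan],
}
df = pd.DataFrame(exam_data, index=['a', 'b', 'c', 'd'])

# NaN is a float, so the whole score column becomes float64
print(df['score'].dtype)
```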

Exp 5 · Core Concept

Viewing Rows: head() & tail()

These are the two most-used methods for peeking at a DataFrame quickly.

df.head(n)

Returns the first n rows. Default is 5 if you omit the argument.

df.head(3)   # rows a, b, c
df.head()    # rows a–e (first 5)
df.tail(n)

Returns the last n rows. Also defaults to 5.

df.tail(3)   # rows h, i, j
df.tail()    # rows f–j (last 5)

Analogy

Think of head/tail like a newspaper — head shows the headline section, tail shows the classifieds at the back. Both give you a slice without changing the original.
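Both methods can be exercised on a toy 10-row frame with letter labels, mirroring the Exp-5 setup:

```python
import pandas as pd

labels = list('abcdefghij')
df = pd.DataFrame({'n': range(10)}, index=labels)

print(df.head(3).index.tolist())   # ['a', 'b', 'c']
print(df.tail(3).index.tolist())   # ['h', 'i', 'j']
print(len(df.head()))              # 5 — the default
```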

Row Selection

loc vs iloc — Know the Difference

Two indexers that look similar but behave differently:

df.loc[label]

Label-based. Use the actual index labels ('a', 'b'…). End of slice is inclusive.

df.loc['a':'c']
# rows a, b, c  ✓
df.iloc[position]

Position-based. Always 0-indexed integers. End of slice is exclusive.

df.iloc[0:3]
# positions 0,1,2  ✓
Indexer   Works with          Slice end   Example
loc       Index labels        Inclusive   df.loc['a':'c']
iloc      Integer positions   Exclusive   df.iloc[0:3]
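The inclusive/exclusive difference in one runnable check (toy data):

```python
import pandas as pd

df = pd.DataFrame({'v': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

# loc slices by label and INCLUDES the end label
print(df.loc['a':'c', 'v'].tolist())   # [10, 20, 30]

# iloc slices by position and EXCLUDES the end position
print(df.iloc[0:3]['v'].tolist())      # [10, 20, 30]  (positions 0, 1, 2)
```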
NaN
Exp 6 · Missing Values

What is a Missing Value?

Next up: understanding why missing values appear, how to detect them, and the three strategies to handle them.

Exp 6 · Core Concept

Understanding NaN

NaN = "Not a Number". In Pandas, it represents a missing or undefined value in a floating-point column.

NaN

It is a float

Pandas uses the IEEE-754 floating-point NaN (Python's float('nan')). It propagates: NaN + 5 = NaN

📭

Where it comes from

Missing survey answers, failed sensor readings, unfilled form fields, data import errors.

⚠️

Why it matters

Statistical functions either skip NaN silently (Pandas) or propagate it (NumPy), so results can mislead. Decide how to handle it before analysis.

NaN ≠ 0  |  NaN ≠ empty string  |  NaN ≠ None

All three look "empty" but are treated differently by Pandas. isnull() catches only NaN and None (Python's null). Zero and empty strings are valid values.
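This distinction is easy to verify:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, None, 0, ''])

# Only NaN and None register as missing; 0 and '' are real values
print(s.isnull().tolist())   # [True, True, False, False]
```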

Exp 6 · Core Concept

Detecting Missing Values

Before you fix missing data, you must find it. Pandas gives you several tools:

df.isnull()         # True where NaN, False elsewhere
df.isna()           # alias — exactly the same
df.notnull()        # inverse: True where value exists

df.isnull().sum()   # count NaNs per column ★ most useful
df.isnull().any()   # True/False per column: any NaN?

Output of isnull()

Returns a boolean DataFrame — same shape, but each cell is True/False. You can chain with .sum() to count per column.

Exam-data example

In Exp-5 data, score has 2 NaNs (rows d & h). Running df['score'].isnull().sum() returns 2.

Exp 6 · Core Concept

fillna() — Replace Missing Values

df.fillna(value) substitutes every NaN with a value you choose. It returns a new DataFrame by default.

# Fill with a fixed number
df.fillna(0)

# Fill with column mean (very common in data science)
df['score'].fillna(df['score'].mean())

# Fill forward (use previous row's value)
df.ffill()   # older pandas: df.fillna(method='ffill')

# Fill backward (use next row's value)
df.bfill()   # older pandas: df.fillna(method='bfill')

inplace=True

By default fillna returns a copy. Add inplace=True to modify the original DataFrame directly: df.fillna(0, inplace=True)
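A minimal sketch of the copy-by-default behaviour, filling with the column mean:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 30.0])

# fillna returns a NEW Series; the original still holds its NaN
filled = s.fillna(s.mean())    # mean skips NaN: (10 + 30) / 2 = 20
print(filled.tolist())         # [10.0, 20.0, 30.0]
print(int(s.isnull().sum()))   # 1  (original untouched)
```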

Exp 6 · Core Concept

dropna() — Remove Missing Rows

df.dropna() removes any row (or column) that contains at least one NaN.

# Drop every row that has ANY NaN
df.dropna()

# Drop only rows where ALL values are NaN
df.dropna(how='all')

# Drop columns instead of rows
df.dropna(axis=1)

# Keep rows that have at least 3 non-NaN values
df.dropna(thresh=3)

When to drop

Only drop when the missing rows are few and random. Dropping too many rows can introduce bias into your analysis.

When to fill

Fill when you have a sensible substitute (mean, 0, the next known value). Filling preserves dataset size.

Exp 6 · Strategy Guide

Choosing the Right Strategy

Situation                                              Best Approach              Method
No meaningful substitute exists, row is mostly empty   Drop the row               dropna()
Numerical column, want to preserve size                Fill with mean/median      fillna(df[col].mean())
Categorical column (Yes/No)                            Fill with mode             fillna(df[col].mode()[0])
Time-series / ordered data                             Forward or backward fill   ffill() / bfill()
Replacement value is known (e.g., 0)                   Fill with constant         fillna(0)

Pro tip: Always inspect with isnull().sum() before and after any fill or drop to confirm your operation worked as expected.
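The pro tip as code, on a toy column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [9.0, np.nan, 16.5, np.nan]})

before = int(df['score'].isnull().sum())   # 2 NaNs to start
df = df.fillna(df['score'].mean())         # mean of 9.0 and 16.5 is 12.75
after = int(df['score'].isnull().sum())    # 0 after filling

print(before, after)
```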

Quick Reference

Essential DataFrame Attributes

These attributes let you inspect a DataFrame without printing the whole thing:

df.shape      # (rows, cols) — e.g. (10, 4)
df.dtypes     # data type of each column
df.columns    # list of column names
df.index      # row labels
df.info()     # summary + non-null counts ★
df.describe() # stats: mean, std, min, max…

df.info()

Shows column names, non-null count, and dtype. The fastest way to spot missing data at a glance.

df.describe()

Gives count, mean, std, min, percentiles, max for numeric columns. NaN rows are excluded from count.

df.shape

A tuple — first number is rows, second is columns. No parentheses — it's a property, not a method.

Summary

Concepts Covered Today

1

Import NumPy & Pandas

import numpy as np  import pandas as pd

2

Element-wise Power (Exp 4)

np.power(x1, x2) — raises each element of x1 to corresponding element of x2

3

DataFrame from dict + custom index (Exp 5)

pd.DataFrame(data, index=labels) + df.head(n)

4

Detect Missing Values (Exp 6)

isnull()  /  isna()  /  isnull().sum()

5

Handle Missing Values (Exp 6)

fillna(value) to replace  |  dropna() to remove

🎯 Remember: Understand the concept first — then the code writes itself.

Pandas Notebook

Run the Pandas Notebook

Open the accompanying Pandas.ipynb notebook directly in Google Colab — no installation needed, runs entirely in your browser.

Open in Google Colab
📓 Pandas.ipynb ☁️ Runs in Cloud 🐍 Python 3

Tip: Sign in with your Google account to save your work in Google Drive.

Use arrow keys or swipe to navigate